fix: agent loses context and halts after first session compaction#3042

Merged: rumpl merged 2 commits into docker:main from rumpl:fix/compaction-context-loss on Jun 9, 2026

@rumpl (Member) commented on Jun 9, 2026

Fixes #2871

Problem

After the first session compaction in a multi-agent run, the agent halts mid-task and replies as if it has no conversation history ("I understand you're looking for a session summary, but ... no previous conversation history visible").

Root cause

Two compounding bugs, surfaced by ead9745 / 8dba51f (2026-05-18) which expanded when compaction activates — four days before the issue was filed:

  1. Phantom trigger in multi-agent runs: compactIfNeeded estimated newly-added tokens via sess.GetAllMessages(), which recurses into sub-sessions. The content produced by a transfer_task child was attributed to the parent session even though it never enters the parent's prompt (GetMessages skips sub-session items). The phantom tokens triggered compaction of a parent conversation that was actually tiny; with everything fitting the keep budget, the split resolved to the "compact everything, keep nothing" sentinel — wiping the user's task and the in-flight tool exchange. The agent's next prompt was literally just Session Summary: ..., which models read as the user asking for a summary. This also explains the "first compaction only" symptom: the first compaction fires while the parent history is still tiny; after re-prompting, later compactions keep a real tail.

  2. Fixed budgets break small context windows: MaxSummaryTokens (16k) and maxKeepTokens (20k) are absolute constants. For models whose window resolves from provider_opts.context_size and is ≤ ~16k, the summarizer's input budget went to zero — it received only its own prompts, fabricated a "no history" non-summary, and that text replaced the entire session history.

Fix

  • Session.OwnMessages() (no sub-session recursion) now drives the compaction trigger's token accounting, so sub-agent work no longer causes phantom parent compactions.
  • Summary and keep budgets scale with the window: min(16k, limit/4) for the summary cap and min(20k, limit/5) for the kept tail. The scaled summary cap is also used for the summary call's max_tokens.
  • Safety net: RunLLM no-ops when no conversation message fits the summarization budget, instead of running the summarizer on an empty conversation and wiping history with the result.
  • ComputeFirstKeptEntry gains a contextLimit parameter so hook-supplied summaries share the same kept-tail policy.

Tests

  • TestCompactIfNeeded_IgnoresSubSessionTokens — regression test, verified to fail against the old trigger code.
  • TestCompactIfNeeded_TriggersOnOwnMessages — large own tool results still trigger.
  • TestRunLLM_SmallContextWindow — summarizer receives real conversation on an 8k window and a tail is kept.
  • TestRunLLM_NoConversationFits_NoOps — empty summarizer input no-ops instead of wiping history.

task build, task test, and task lint all pass, except for the pre-existing, environment-dependent pkg/sandbox.TestExtraWorkspace failure, which also fails on a clean main.

Note for reviewers

One residual hazard left untouched (documented contract defended in 1e9512e): a legitimately triggered compaction whose whole conversation fits the keep budget (possible with image-heavy histories — token estimates ignore images) still drops the tail via the "compact everything" sentinel. Happy to follow up with a "keep the last user turn on threshold/overflow compaction" change if desired.

Assisted-By: docker-agent

rumpl added 2 commits June 9, 2026 21:30
compactIfNeeded estimated the token impact of newly added messages via
sess.GetAllMessages(), which recurses into sub-sessions. In multi-agent
runs the content produced by a transfer_task child was therefore
attributed to the parent session even though it never enters the
parent's prompt (GetMessages skips sub-session items).

The phantom tokens triggered a compaction of a parent conversation that
was actually tiny; with everything fitting the keep budget the split
resolved to the 'compact everything, keep nothing' sentinel, so the
user's task and the in-flight tool exchange were wiped. The agent's next
prompt was literally just 'Session Summary: ...', which models read as
the user asking for a summary and answer with a confused 'I see no
conversation history' reply, halting mid-task.

Add Session.OwnMessages() (direct messages only, no sub-session
recursion) and use it for the trigger's before/after counts so the
estimate matches what the session actually sends.

Fixes docker#2871

Assisted-By: docker-agent
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

The compactor used fixed absolute budgets: MaxSummaryTokens (16k) was
subtracted from the window when sizing the summarizer's input, and
maxKeepTokens (20k) sized the verbatim-kept tail. Since ead9745 made
compaction activate for models whose window resolves from
provider_opts.context_size, both constants can exceed the entire
window: contextAvailable went to zero, FirstIndexInBudget dropped every
conversation message, and the summarizer received only its own prompts.
It then fabricated an 'I see no conversation history' non-summary that
replaced the real session history.

Scale both budgets to the window (min(16k, limit/4) for the summary
cap, min(20k, limit/5) for the kept tail) so the kept tail plus the
summary always land well under the compaction threshold, and use the
scaled cap for the summary call's max_tokens so small-window providers
don't reject the request.

As a safety net, RunLLM now no-ops when not a single conversation
message fits the summarization budget (e.g. one giant tool result)
instead of running the summarizer on an empty conversation and wiping
the history with the result.

ComputeFirstKeptEntry gains a contextLimit parameter so hook-supplied
summaries share the same kept-tail policy; a non-positive limit falls
back to the unscaled budget.

Related to docker#2871

Assisted-By: docker-agent
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
@rumpl requested a review from a team as a code owner on June 9, 2026 19:31

@docker-agent left a comment

Assessment: 🟢 APPROVE

This PR correctly addresses two compounding compaction bugs:

  1. Phantom token trigger: switching from GetAllMessages() (which recurses into sub-sessions) to OwnMessages() (which does not) ensures sub-agent token counts no longer falsely trigger parent-session compaction.
  2. Fixed budget overflow: scaling MaxSummaryTokens and maxKeepTokens proportionally to the context window (limit/4 and limit/5) prevents the fixed caps from exceeding the entire window on small-window models, and the len(messages) <= 2 no-op guard correctly prevents a fabricated non-summary from replacing real session history.

Verification summary:

  • The ApplyCompaction path only appends to s.Messages, so the sess.OwnMessages()[messageCountBefore:] slice in compactIfNeeded cannot panic (length is monotonically non-decreasing in a single-goroutine call chain).
  • OwnMessages() excluding system-role items is intentional and consistent with GetAllMessages(); the invariant system messages in GetMessages() are built dynamically and were never stored in session items.
  • All four new tests (TestCompactIfNeeded_IgnoresSubSessionTokens, TestCompactIfNeeded_TriggersOnOwnMessages, TestRunLLM_SmallContextWindow, TestRunLLM_NoConversationFits_NoOps) directly target the described regression scenarios.

No confirmed or likely bugs found in the changed code.

@rumpl merged commit 4af658c into docker:main on Jun 9, 2026
8 checks passed
aheritier added a commit that referenced this pull request Jun 10, 2026
…windows

After the fix in #3042, the summary and keep-tail token budgets used during
session compaction scale proportionally to provider_opts.context_size instead
of using absolute 16k/20k constants. Small-context-window models (≤ ~16k)
no longer have their history wiped during compaction.

Ref: #3042
@aheritier added labels on Jun 11, 2026:
  • area/agent: work on the general agent loop and agentic features of the app
  • area/sessions: features, issues, and fixes related to session lifecycle (resume, persistence, export)
  • kind/fix: PR fixes a bug (maps to the fix: commit prefix)